Accompanying Book:
Covered Chapters: Ch. 1 (1.1-1.4), Ch. ???
13 July 2018
Functional Data
From Raw to Functional Data
The simplest data set encountered in functional data analysis is a sample of the form: \[
\begin{align}
x_i(t_j),\quad t_j\in[a,b],\quad i=1,\dots,n,\quad j=1,\dots,J.
\end{align}
\] The (theoretical) objects we wish to study are smooth curves: \[
\begin{align}
X_i(t),\quad t\in[a,b],\quad i=1,\dots,n.
\end{align}
\]
Basis expansions:
Typically, the first step in working with functional data is to express them by means of a basis expansion \[ X_i(t)\approx\sum_{m=1}^M c_{im} B_m(t),\quad t\in[a,b], \] where \(B_1(t),\dots,B_M(t)\) are some standard collection of basis functions like:
B-spline basis functions \(B_1(t),\dots,B_M(t)\):
library(fda)  # basis objects and smoothing functions come from the fda package
bspl_bf <- create.bspline.basis(rangeval=c(0,1), norder=3,
                                breaks=seq(0,1,len=7))
plot(bspl_bf, main="B-spline Basis Functions", xlab="[a,b]=[0,1]")
Fourier basis functions \(B_1(t),\dots,B_M(t)\):
fourier_bf <- create.fourier.basis(rangeval=c(0,1), nbasis=5)
plot(fourier_bf, main="Fourier Basis Functions", xlab="[a,b]=[0,1]")
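To make the expansion concrete: given raw observations \(x_i(t_j)\), the coefficients \(c_{im}\) can be obtained by ordinary least squares on the matrix of basis functions evaluated at the grid points. A minimal base-R sketch, using a hand-built Fourier-type design matrix as a stand-in for the fda basis objects above:

```r
tj <- seq(0, 1, length.out = 50)                  # observation grid t_1, ..., t_J
x  <- sin(2 * pi * tj) + 0.5 * cos(4 * pi * tj)   # one raw curve x_i(t_j)

# Evaluate M = 5 basis functions at the grid: columns B_1(t_j), ..., B_5(t_j)
B <- cbind(1, sin(2 * pi * tj), cos(2 * pi * tj),
           sin(4 * pi * tj), cos(4 * pi * tj))

# Coefficients c_{i1}, ..., c_{iM} minimizing sum_j (x(t_j) - sum_m c_m B_m(t_j))^2
c_hat <- solve(t(B) %*% B, t(B) %*% x)

# Reconstruct the curve from the fitted expansion
x_hat <- B %*% c_hat
max(abs(x - x_hat))   # essentially zero: x lies in the span of the basis
```

In practice one uses `Data2fd` or `smooth.basisPar` (as below), which add a roughness penalty to this least-squares criterion.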
Example: Raw data
matplot(x=growth$age, y=growth$hgtf[,1:5], type="p", lty=1, pch=21,
xlab="age", ylab="height (cm)", cex.lab=1.2,col=1:5,bg=1:5,
main="5 Girls in Berkeley Growth Study")
Example: Pre-processed data
SmObj <- smooth.basisPar(growth$age, y=growth$hgtf[,1:5], lambda=0.1)
plot(SmObj$fd, xlab="age", ylab="height (cm)", cex.lab=1.2,
main="5 Girls in Berkeley Growth Study", lty=1)
Example: Pre-processed data - 1st derivative
plot(deriv(SmObj$fd, 1), xlab="age",
ylab="growth rate (cm / year)", cex.lab=1.2,
main="5 Girls in Berkeley Growth Study (1st derivative)", lty=1)
Example: Pre-processed data - 2nd derivative
plot(deriv(SmObj$fd, 2), xlab="age", cex.lab=1.2,
ylab="growth acceleration (cm / year^2)",
main="5 Girls in Berkeley Growth Study (2nd derivative)", lty=1)
Sample Mean and Covariance
Pointwise mean: \[ \hat{\mu}_n(t)=\bar{X}_n(t)=\frac{1}{n}\sum_{i=1}^n X_i(t) \] Pointwise standard deviation: \[ S_n(t)=\sqrt{\frac{1}{n-1}\sum_{i=1}^n\Big(X_i(t)-\bar{X}_n(t)\Big)^2} \]
Pointwise mean and standard deviation:
Wiener.Mean <- rowMeans(Wiener.mat)
Wiener.SD   <- apply(Wiener.mat, 1, sd)
Pointwise covariance function: \[ \hat{c}_n(t,s)=\frac{1}{n-1}\sum_{i=1}^n\Big(X_i(t)-\bar{X}(t)\Big)\Big(X_i(s)-\bar{X}(s)\Big) \]
Pointwise covariance function:
Wiener.Cov <- var(Wiener.mat)
Principal Component Functions
Idea: Use the Estimated Functional Principal Components (EFPC's) \(\hat{v}_j\) as basis functions for representing the centered functions \(X_i-\bar{X}_n\): \[ X_i(t)-\bar{X}_n(t)\approx\sum_{j=1}^p\hat{\xi}_{ij}\hat{v}_j(t) \]
EFPC's are orthonormal, i.e. \[\textstyle\int_a^b\hat{v}_j(t)\hat{v}_k(t)dt=\begin{cases}1, &j=k\\0,&j\neq k.\end{cases}\]
\[
\begin{align}
\hat{c}_n(t,s)&\approx\sum_{j=1}^p\hat{\lambda}_j\hat{v}_j(t)\hat{v}_j(s),
\end{align}
\] where \(\hat{\lambda}_1\geq\dots\geq\hat{\lambda}_p\geq 0\) denote the estimated eigenvalues.
\[
\begin{align}
X_i(t)&\approx\bar{X}_n(t) + \sum_{j=1}^p\hat{\xi}_{ij}\hat{v}_j(t)
\end{align}
\]
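On a discrete grid, the EFPC's \(\hat{v}_j\) and eigenvalues \(\hat{\lambda}_j\) can be approximated by an eigen decomposition of the sample covariance matrix, with the grid spacing entering as a quadrature weight. A base-R sketch on simulated Brownian motion paths (analogous to `Wiener.mat` above; this illustrates the idea, it is not the fda package's `pca.fd` routine):

```r
set.seed(1)
J <- 100; n <- 500
tt <- (1:J) / J
h  <- 1 / J                     # grid spacing (quadrature weight)

# Simulate n Brownian motion paths (rows = time points, cols = curves)
Wiener.mat <- apply(matrix(rnorm(J * n, sd = sqrt(h)), J, n), 2, cumsum)

# Sample covariance matrix c_hat(t_l, t_k) and its eigen decomposition
C   <- var(t(Wiener.mat))
eig <- eigen(C, symmetric = TRUE)

# Eigenvalues of the integral operator ~ matrix eigenvalues times h;
# rescale eigenvectors so that int v_j(t)^2 dt = 1
lambda_hat <- eig$values * h
v_hat      <- eig$vectors / sqrt(h)

# Compare with the theoretical BM eigenvalues lambda_j = ((j - 1/2) pi)^(-2)
lambda_theo <- ((1:3 - 0.5) * pi)^(-2)
cbind(lambda_hat[1:3], lambda_theo)
```

The estimated eigenvalues should be close to the theoretical ones, and the columns of `v_hat` are orthonormal in the discretized \(L^2\) inner product.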
Case Study:
BOA Stock Returns
We study the daily cumulative log-return functions: \[R_i(t):=\log(P_i(t))-\log(P_i(0))\approx\frac{P_i(t)-P_i(0)}{P_i(0)}\]
# Data
BOA <- read.table("DataSets/BOA.txt",header=TRUE)
Dates <- dimnames(BOA)[[1]]
BOA <- data.matrix(BOA)
BOA <- BOA[-which(Dates=="08/26/2004"),]  # remove outlier
n <- dim(BOA)[1]
J <- dim(BOA)[2]
Times <- seq(0,6.5,length=J)
# Cumulative log-return functions (raw data)
log_BOA <- log(BOA) - matrix(log(BOA)[,1], nrow=n, ncol=J)
# B-spline basis functions
bspl_basis <- create.bspline.basis(rangeval=c(0,6.5), norder=4, nbasis=200)
# Cumulative log-return functions (with basis functions)
log_BOA_fd <- Data2fd(Times,t(log_BOA),basisobj = bspl_basis)
# Plot functional data
plot(log_BOA_fd, xlab="Trading Hours",ylab="",lwd=1, col=gray(.5),
main="Cumulative Log-Return Functions")
lines(log_BOA_fd[1:10],lwd=1.5, lty=1)
Plot of mean function for BOA cumulative returns. Point-wise 95% confidence intervals are included in red.
EFPC's of BOA cumulative returns versus the theoretical eigenfunctions of Brownian Motions.
Square Integrable Functions
What makes the space \(L^2([a,b])\) so convenient is its structure:
The inner product of two functions \(f\) and \(g\) is \[\langle f,g\rangle=\int_a^b f(t)g(t)dt.\]
Two functions \(f\) and \(g\) are orthogonal if \[\langle f,g\rangle=0.\]
The norm \(||f||=\sqrt{\langle f,f\rangle}\) gives us a notion of the distance between two functions: \[d(f,g)=||f-g||.\]
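Numerically, inner products, norms, and distances are approximated by Riemann sums over a fine grid. A base-R sketch on \([a,b]=[0,1]\), using two functions whose inner products are known in closed form:

```r
# Riemann-sum approximation of the L^2([0,1]) inner product
tt <- seq(0, 1, length.out = 1000)
h  <- tt[2] - tt[1]
ip <- function(f, g) sum(f(tt) * g(tt)) * h

f <- function(t) sqrt(2) * sin(pi * t)
g <- function(t) sqrt(2) * sin(2 * pi * t)

ip(f, g)        # ~0: f and g are orthogonal
sqrt(ip(f, f))  # ~1: f has unit norm

# Distance d(f, g) = ||f - g||
d <- function(f, g) sqrt(ip(function(t) f(t) - g(t), function(t) f(t) - g(t)))
d(f, g)         # ~sqrt(2), since ||f||^2 + ||g||^2 = 2 and <f, g> = 0
```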
As we have already seen, basis expansions play an important role in methodology for functional data.
Random Functions
Let \(X\) denote a random function defined on a probability space, say \(\Omega\).
We assume that all realizations \(X(\omega)\), \(\omega\in\Omega\), are elements of the space \(L^2([a,b])\) of square integrable functions, i.e.,
\(||X(\omega)||=\sqrt{\int_a^b \big(X(\omega)(t)\big)^2dt} < \infty\) for all \(\omega\in\Omega\).
\(||X||\in\mathbb{R}\) is thus a real random variable.
If \(E(||X||^2)<\infty\), we say that the random function \(X\) is square integrable. Caution: Integration is here with respect to \(\omega\), not \(t\). It might be more pedagogical to say that the random function \(X\) has a finite second moment.
The covariance function \(c(t,s)=\operatorname{Cov}(X(t),X(s))\) leads to Functional Principal Component Analysis (FPCA), which allows us to represent a square integrable random function \(X\) as \[ \textstyle X(t)=\mu(t)+\sum_{j=1}^\infty\xi_j v_j(t), \] where:
The eigenfunctions \(v_j\) are the solutions of the eigen-equation \(\int_a^bc(t,s)v_j(s)ds=\lambda_jv_j(t)\).
\(\lambda_1,\lambda_2,\dots\) denote the eigenvalues.
The random variables \(\xi_j\) are called the scores \[ \textstyle \xi_j=\langle X-\mu,v_j\rangle=\int_a^b(X(t)-\mu(t))v_j(t)dt. \]
FPCA allows us to represent a square integrable random function \(X\) as (Karhunen-Loève expansion) \[ \textstyle X(t)=\mu(t)+\sum_{j=1}^\infty\xi_j v_j(t). \]
It can be shown that: \[ E(\xi_j)=0,\; V(\xi_j)=E(\xi_j^2)=\lambda_j, \; \operatorname{Cov}(\xi_j,\xi_k)=0,\,j\neq k, \] and \(E\int_a^b(X(t)-\mu(t))^2dt=E||X-\mu||^2=\sum_{j=1}^\infty\lambda_j\).
That is, \(\lambda_j\) is the variance of the random function \(X\) in the principal direction \(v_j\) and the sum of these variances is the total variance of \(X\).
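These properties can be checked by simulation. A base-R sketch for Brownian motion, where \(\mu=0\), \(v_j(t)=\sqrt{2}\sin((j-\tfrac12)\pi t)\), and \(\lambda_j=((j-\tfrac12)\pi)^{-2}\) are known in closed form:

```r
set.seed(2)
J <- 200; n <- 2000
tt <- (1:J) / J
h  <- 1 / J

# Simulate n Brownian motion paths (rows = time points, cols = curves)
X <- apply(matrix(rnorm(J * n, sd = sqrt(h)), J, n), 2, cumsum)

# Known eigenfunctions of BM: v_j(t) = sqrt(2) sin((j - 1/2) pi t)
v <- sapply(1:3, function(j) sqrt(2) * sin((j - 0.5) * pi * tt))

# Scores xi_ij = int X_i(t) v_j(t) dt (Riemann sum; mu = 0 for BM)
xi <- t(X) %*% v * h            # n x 3 matrix of scores

colMeans(xi)                    # ~0:        E(xi_j) = 0
apply(xi, 2, var)               # ~lambda_j: V(xi_j) = ((j - 1/2) pi)^(-2)
cor(xi[, 1], xi[, 2])           # ~0:        scores are uncorrelated
```

The sample variances of the scores match the theoretical eigenvalues up to sampling and discretization error, and the cross-correlations are near zero, as the identities above predict.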